
Redshift

Querying Overview

Redshift is a SQL data warehouse from AWS which is generally compatible with standard Postgres. A discussion of the compatibility differences can be found at https://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html

A discussion of the SQL interface to Redshift can be found at https://docs.aws.amazon.com/redshift/latest/dg/cm_chap_SQLCommandRef.html

Querying Context

Database

The Qarbine administrator sets the database of the Qarbine data service when defining the data service. You cannot switch databases mid-session with a command such as USE (found in MySQL and SQL Server; neither Redshift nor Postgres supports it). All queries in the session can only access objects (schemas, tables, etc.) within the configured database. To access another database, you must create a new connection specifying that database.

To see the databases managed by the server you can still run

show databases

Schema

The Qarbine administrator may set a schema value in the data service definition. When set, that schema becomes the first element in the search path. By default, the search path is "$user", public, where "$user" is replaced by the schema matching the current session user's name, if such a schema exists.

The Redshift search path is a session-level setting that determines the order in which schemas are searched when you reference a database object (such as a table or function) by an unqualified name (i.e., without specifying the schema). This mechanism allows you to control which schema Redshift looks in first when resolving object names, which is especially useful if you have objects with the same name in different schemas.
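The resolution order described above can be sketched in plain JavaScript. This is an illustrative model only, not Redshift internals; the catalog contents are hypothetical.

```javascript
// Illustrative sketch - not Redshift internals. An unqualified table name
// is resolved by checking each schema in the search path in order; the
// first schema containing the object wins.
function resolveTable(searchPath, catalog, table) {
  for (const schema of searchPath) {
    if (catalog[schema]?.includes(table)) {
      return `${schema}.${table}`;
    }
  }
  return null; // not found anywhere: a qualified name is required
}

// Hypothetical catalog: both schemas define a "category" table.
const catalog = { tickit: ["category", "sales"], public: ["category"] };
console.log(resolveTable(["tickit", "public"], catalog, "category")); // tickit.category
console.log(resolveTable(["public"], catalog, "sales")); // null
```

Note that when the same name exists in several schemas, reordering the search path changes which object an unqualified reference resolves to.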

To see the effective search path run

show search_path

A sample result is shown below.

  

In this example the data service has a data option of “schema=tickit”. If a table is not found in any schema in the search path, you must use a qualified name (e.g., schema_name.table_name). For more information see https://docs.aws.amazon.com/redshift/latest/dg/r_search_path.html.

With no schema configured, retrieving from the category table requires a schema prefix such as

select * from tickit.category

With the “tickit” schema configured, retrieving from the category table can simply be

select * from category

Referencing Tables

The table reference sample_data_dev.tickit.sales in Amazon Redshift consists of three elements, each specifying a level of organization within the database:

Element          Description
sample_data_dev  The database name. This is the database you are connected to or referencing.
tickit           The schema within the database. Schemas group related tables and other database objects.
sales            The table name. In this case, it refers to the sales fact table in the TICKIT sample schema.

General Data Handling

In general, primitive data types such as numerics, strings, and booleans appear in the answer set as their expected JavaScript values. A discussion of data types can be found at
https://docs.aws.amazon.com/redshift/latest/dg/c_Supported_data_types.html

Handling JSON Data

Redshift does not have a native JSON or JSONB data type as in PostgreSQL. Instead, JSON and other semi-structured data are stored in the SUPER type, which is designed for this purpose. SUPER columns can store complex, nested JSON documents, arrays, and objects. However, the “JSON” row values returned from Redshift are simple JSON strings and not JSON objects. To convert the strings into genuine JSON objects use

#pragma convertToObject CSV_list_of_fields
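Conceptually, the pragma above parses the named string-valued fields of each answer set row into genuine objects. The sketch below is illustrative only, not Qarbine internals; the function and field names are hypothetical.

```javascript
// Illustrative sketch - not Qarbine internals. The pragma conceptually
// parses the named string fields of each answer set row into JSON objects.
function applyConvertToObject(rows, csvFieldList) {
  const fields = csvFieldList.split(",").map((f) => f.trim());
  return rows.map((row) => {
    const copy = { ...row };
    for (const field of fields) {
      if (typeof copy[field] === "string") {
        copy[field] = JSON.parse(copy[field]); // string -> object
      }
    }
    return copy;
  });
}

// A SUPER column arrives from Redshift as a JSON string.
const rows = [{ user_id: 1, profile: '{"name": "Alice", "age": 30}' }];
const converted = applyConvertToObject(rows, "profile");
console.log(converted[0].profile.name); // Alice
```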

A discussion of Redshift JSON related functions can be found at https://docs.aws.amazon.com/redshift/latest/dg/json-functions.html

Consider this table which uses a SUPER column to store JSON strings.

CREATE TABLE user_profiles (
user_id INT,
profile SUPER
);

In this table definition you can store different attributes for different users without altering the table schema. You can insert JSON data using the JSON_PARSE() function to convert a JSON string into the SUPER type:

INSERT INTO user_profiles (user_id, profile) VALUES
(1, JSON_PARSE('{"name": "Alice", "age": 30, "skills": ["SQL", "Python"]}')),
(2, JSON_PARSE('{"name": "Bob", "skills": ["R", "SQL"], "location": "London"}')),
(3, JSON_PARSE('{"name": "Charlie", "hobbies": ["guitar", "skiing"], "age": 28}')),
(4, JSON_PARSE('{"username": "daisy99", "signup_date": "2025-01-01"}'));

Running this query in Redshift Studio

SELECT user_id, profile
FROM user_profiles;

shows

  

The result in the Data Source Designer is shown below.

  

As expected, the user_id is a number and the profile is a string.

Running the following query extracts specific fields from the profile

SELECT  user_id,  profile.name AS name, profile.age as age
FROM user_profiles;

The result in Redshift Studio is

  

The result in Qarbine is

  

Such SQL is fine if the JSON documents have fairly similar content. Often this is not the case: the author of the SQL may not know all the possible fields, or the data may evolve so that a previously satisfactory SQL statement later misses important new data. Here is a query that assumes a lot about the profile JSON content.

SELECT  user_id, profile.name AS name,  profile.age AS age, profile.skills[0] AS primary_skill,
profile.location AS location, profile.hobbies[0] AS primary_hobby,
profile.username AS username, profile.signup_date AS signup_date
FROM user_profiles;

The answer set is shown below.

user_id  name     age   primary_skill  location  primary_hobby  username  signup_date
1        Alice    30    SQL            NULL      NULL           NULL      NULL
2        Bob      NULL  R              London    NULL           NULL      NULL
3        Charlie  28    NULL           NULL      guitar         NULL      NULL
4        NULL     NULL  NULL           NULL      NULL           daisy99   2025-01-01

Vital considerations include how to handle multiple hobbies and skills, and the potential explosion of the answer set size if arrays are unnested. A simpler solution is to have the answer set contain genuine JSON objects.

Manipulating Row Shape

Qarbine can be directed to further modify the answer set elements. This is done using “pragmas”, which are discussed in the general Data Source Designer documentation. The most common pragmas are “convertToObject CSV_list_of_fields” and “pullFieldsUp CSV_list_of_fields”. For example, this query

SELECT user_id, profile
FROM user_profiles;

returns this answer set.

  

Here the profile column is a SUPER column containing JSON strings.

Adding a pragma, the query specification is now

#pragma convertToObject profile
SELECT user_id, profile
FROM user_profiles;

The new answer set is shown below.

  

The profile field values are now JSON objects. The details of the first row are shown below.

  

The pragma converts the profile string value into an object and stores the result back in the field named “profile”. Template formulas can then reference fields using either

@current.profile.someField

or

#profile.someField
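Once profile is a genuine object, such a dotted reference can be resolved by a simple walk of the current row. The sketch below is illustrative only, not Qarbine's formula engine.

```javascript
// Illustrative sketch - not Qarbine's formula engine. A dotted reference
// such as "profile.name" is resolved by walking the current row object.
const current = { user_id: 1, profile: { name: "Alice", age: 30 } };

function fieldValue(row, path) {
  return path.split(".").reduce((obj, key) => obj?.[key], row);
}

console.log(fieldValue(current, "profile.name")); // Alice
console.log(fieldValue(current, "profile.age")); // 30
```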

Multiple pragmas can be sequenced together. For example, this query specification

#pragma convertToObject profile
#pragma pullFieldsUp profile
SELECT user_id, profile
FROM user_profiles;

returns this answer set.

  

In this example the second pragma also moves the child fields of profile up to the top level. The fields that were previously within the profile object have been pulled up a level, which makes it more convenient to reference field values in a template formula. Template formulas can then reference fields using either

@current.someField

or

#someField
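The pull-up step can be sketched in plain JavaScript. This is illustrative only, not Qarbine internals; in particular, whether the parent field itself is retained is an assumption here (it is dropped below).

```javascript
// Illustrative sketch - not Qarbine internals. "pullFieldsUp profile"
// conceptually promotes the child fields of the named object field to the
// top level of each row. Dropping the parent field is an assumption.
function applyPullFieldsUp(rows, csvFieldList) {
  const fields = csvFieldList.split(",").map((f) => f.trim());
  return rows.map((row) => {
    const copy = { ...row };
    for (const field of fields) {
      const child = copy[field];
      if (child && typeof child === "object") {
        delete copy[field];
        Object.assign(copy, child); // promote child fields one level up
      }
    }
    return copy;
  });
}

const rows = [{ user_id: 1, profile: { name: "Alice", age: 30 } }];
const flat = applyPullFieldsUp(rows, "profile");
console.log(flat[0]); // { user_id: 1, name: 'Alice', age: 30 }
```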

Array Handling

In Amazon Redshift, you can store and query arrays using the SUPER data type, which supports storing arrays as part of JSON-like documents. Consider this table definition.

CREATE TABLE user_skills (
user_id INT,
skills SUPER
);

It is populated using

INSERT INTO user_skills (user_id, skills) VALUES
(1, JSON_PARSE('["SQL", "Python", "Redshift"]')),
(2, JSON_PARSE('["R", "SQL"]')),
(3, JSON_PARSE('["Java", "Scala", "Spark", "Python"]'));

Running this query

SELECT user_id, skills
FROM user_skills;

returns the answer set

  

Applying a pragma

#pragma convertToObject skills
SELECT user_id, skills
FROM user_skills;

returns this answer set.

  

The first row details are

  

Unnesting Arrays

Here is a query that “unnests” array values using Redshift’s PartiQL-style navigation, in which the SUPER array is aliased in the FROM clause and iterated. (Note that LATERAL FLATTEN is Snowflake syntax and is not supported by Redshift.)

SELECT u.user_id, s AS skill
FROM user_skills u, u.skills s;

For each element in the array, an answer set row is created. This can lead to rapidly increasing answer set size with a lot of redundant data because of the array element multiplier.

user_id  skill
1        SQL
1        Python
1        Redshift
2        R
2        SQL
3        Java
3        Scala
3        Spark
3        Python
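The multiplier effect of unnesting can be sketched client-side. This is an illustrative model of what unnesting does to answer set size, not Qarbine or Redshift code.

```javascript
// Sketch of the unnesting multiplier: each input row produces one output
// row per array element, duplicating the scalar columns every time.
function unnest(rows, arrayField, asName) {
  return rows.flatMap((row) => {
    const { [arrayField]: values, ...rest } = row;
    return values.map((value) => ({ ...rest, [asName]: value }));
  });
}

const rows = [
  { user_id: 1, skills: ["SQL", "Python", "Redshift"] },
  { user_id: 2, skills: ["R", "SQL"] },
  { user_id: 3, skills: ["Java", "Scala", "Spark", "Python"] },
];
const flat = unnest(rows, "skills", "skill");
console.log(flat.length); // 9 rows from 3 input rows
```

With several array columns the effect compounds, since each unnesting step multiplies the row count again.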

The preferred approach, and the one Qarbine favors, is not to unnest arrays at all and instead interact with the JSON objects in their natural form.

System Tables

The query

SELECT * FROM information_schema.tables

does not show any tables in Amazon Redshift because Redshift does not fully support the standard information_schema views as in PostgreSQL or other relational databases. Redshift provides its own set of system tables and views for metadata, and the information_schema is either incomplete or not populated in the same way.

For more information see
https://docs.aws.amazon.com/redshift/latest/gsg/t_querying_redshift_system_tables.html
https://docs.aws.amazon.com/redshift/latest/dg/c_sampledb.html

Metadata Source            Works in Redshift Query Editor  Works in Node.js Client  Recommended?
information_schema.tables  Yes (sometimes, not reliable)   No                       No
PG_TABLE_DEF               Yes                             Yes                      Yes
SVV_TABLES                 Yes                             Yes                      Yes

Sample Data Sets

Redshift provides several sample data sets. The TICKIT sample data set used by the Amazon Redshift documentation examples can be loaded by following the Amazon Redshift documentation. Its relationships are shown below.

  

More information on the TICKIT sample database can be found at https://docs.aws.amazon.com/redshift/latest/dg/c_sampledb.html

Troubleshooting

If errors occur while using Qarbine then a good course of action is to run the query using your traditional Postgres tools, such as psql or pgAdmin.

The Query Profiler in the Redshift console lets you analyze query execution plans, identify bottlenecks, and see where time is spent (e.g., on nested loop joins or data distribution steps).

Reference

For more information see https://docs.aws.amazon.com/redshift/